class: center, middle, inverse, title-slide

# STATS 220
## Introduction to
.small[the language of data analysis]
### Earo Wang

---

## About me

* 🎓 I earned a PhD (Stats) @ Monash University, Australia
* ❤️ My research interests lie in exploratory data analysis, data visualisation, software design, ...
* 👩‍💻 That said, I turn ☕ into `#rstats` 📦
* Outside of work, I play 🎾 & 🏓

<br>
<br>
<br>
<br>

* 🔍 Where to find me
  + <i class="fas fa-globe"></i> [earo.me](https://earo.me)
  + <i class="fab fa-github"></i> [@earowang](https://github.com/earowang)
  + <i class="fab fa-twitter"></i> [@earowang](https://twitter.com/earowang)

???

Embed my packages' logos

---

class: inverse middle center

## Data + Technology

---

## What do I mean by "data"?

--

<img src="../img/iris-mtcars-meme.jpg" width="65%" style="display: block; margin: auto;" />

???

How many of you have seen and worked with datasets such as *iris*, *mtcars* and *wages*?

---

## How do I learn about XX "technology"?

* Documentation! Documentation! Documentation!
* Get your hands dirty
* Find the community: #rstats on Twitter
* Ask questions on Stack Overflow and RStudio Community
* Google is your best friend: search for the error message

Where XX is not limited to R: it also covers Python, JavaScript, etc.

---

class: inverse middle

## .center[Why <i class='fab fa-r-project'></i>]

* A general-purpose programming language
* Created by statisticians as a language for statistical computing and graphics
* [15,395+ packages on CRAN](https://cran.r-project.org/web/packages/), plus more on GitHub and elsewhere
* The tidyverse, a domain-specific language in R for data scientists

---

.left-column[
## What can R do?
### - for fun
]
.right-column[
### 📦 `memer` for creating memes

```r
# remotes::install_github("sctyner/memer")
library(memer)
meme_get("DistractedBf") %>%
  meme_text_distbf("data science", "new students", "statistics")
```

<img src="figure/memer-1.png" width="80%" style="display: block; margin: auto;" />
]

---

.left-column[
## What can R do?
### - for fun
### - for data
]
.right-column[
* add the image of Hadley's ds workflow
]

---

.left-column[
## What can R do?
### - for fun
### - for data
### - for writing
]
.right-column[
* `rmarkdown` for assignments/reports/papers in `.html` and `.pdf`
* `blogdown` for blogs
* `bookdown` for books
* `xaringan` for slides, such as the lecture slides you're reading now: leverage what you learn about HTML and CSS to style presentations your own way.
]

---

class: inverse middle

## .center[Why RStudio]

* An IDE (integrated development environment)
* Working environment for R
* Project workflow

---

## RStudio IDE

* image about panes

???

default settings

1 minute to choose your favourite theme

---

## R Project `.Rproj`

* Easy to share and collaborate
* No `setwd()`, or Jenny Bryan will set your computer on fire [a link]

---

## Directory structure

Under `stats220`

* `data/`:
  * `data1.csv`
  * `data2.xls`
* `assignments/`:
  *

---

class: inverse middle

## <i class='fab fa-r-project'></i> as a general-purpose programming language

---

.left-column[
## R basics
### - access to RAM
]
.right-column[
Store values temporarily in computer memory

```r
akl_lon <- 174.76
akl_lat <- -36.85
```

⬆️ They are .red[assignments].

* left-hand side: .red[variable names] or .red[symbols], which should start with a letter.
  .checked[+ `akl_lon` (good practice)]
  .x[+ `akl.lon` & `aklLon` (not recommended)]
* assignment operator:
  .checked[+ `<-`]
  .x[+ `=` & `->`: `174.76 -> akl_lon`]
* right-hand side: .red[values]
]

---

.left-column[
## R basics
### - access to RAM
]
.right-column[
Retrieve values from computer memory

```r
akl_lon
```

```
#> [1] 174.76
```

```r
akl_lat
```

```
#> [1] -36.85
```
]

---

.left-column[
## R basics
### - access to RAM
### - access to CPU
]
.right-column[
Perform operations or calculations, such as arithmetic and comparisons

```r
akl_lon_region <- akl_lon + c(-1, 1)
akl_lat_region <- akl_lat + c(-.5, .5)
akl_lon_region
```

```
#> [1] 173.76 175.76
```

```r
akl_lat_region
```

```
#> [1] -37.35 -36.35
```
]

---

.left-column[
## R basics
### - access to RAM
### - access to CPU
### - access to mass storage
]
.right-column[
Read data files from hard disks, USB sticks, etc. into RAM

```r
library(sf)
akl_bus <- st_read("data/Bus_Route/Bus_Route.shp")
```

```
#> Reading layer `Bus_Route' from data source `/Users/wany568/Teaching/stats220/data-tech/lectures/data/Bus_Route/Bus_Route.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 496 features and 7 fields
#> geometry type: MULTILINESTRING
#> dimension: XY
#> bbox: xmin: 1727652 ymin: 5859539 xmax: 1787138 ymax: 5982575
#> epsg (SRID): 2193
#> proj4string: +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
```
]

---

.left-column[
## R basics
### - access to RAM
### - access to CPU
### - access to mass storage
### - access to screen
]
.right-column[
Print out results

```r
akl_bus[1:4, ]
```

```
#> Simple feature collection with 4 features and 7 fields
#> geometry type: MULTILINESTRING
#> dimension: XY
#> bbox: xmin: 1751253 ymin: 5915245 xmax: 1758019 ymax: 5921401
#> epsg (SRID): 2193
#> proj4string: +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#>   OBJECTID ROUTEPATTE AGENCYNAME                                 ROUTENAME
#> 1   304675      02005        NZB St Lukes To Wynyard Quarter Via Kingsland
#> 2   304676      02006        NZB Wynyard Quarter To St Lukes Via Kingsland
#> 3   304677      02209        NZB  Avondale To City Centre Via New North Rd
#> 4   304678      02208        NZB  City Centre To Avondale Via New North Rd
#>   ROUTENUMBE MODE Shape__Len                       geometry
#> 1         20  Bus   8042.190 MULTILINESTRING ((1755487 5...
#> 2         20  Bus   7919.198 MULTILINESTRING ((1756321 5...
#> 3        22A  Bus  11428.889 MULTILINESTRING ((1757613 5...
#> 4        22A  Bus  11606.254 MULTILINESTRING ((1757346 5...
```
]

---

.left-column[
## R basics
### - access to RAM
### - access to CPU
### - access to mass storage
### - access to screen
]
.right-column[
Produce visual displays

```r
library(ggplot2)
ggplot() +
  geom_sf(data = akl_bus, aes(colour = AGENCYNAME))
```

<img src="figure/screen-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.left-column[
## R basics
### - access to RAM
### - access to CPU
### - access to mass storage
### - access to screen
### - access to network
]
.right-column[
Access data from remote computers, including web servers on the internet

.panelset[
.panel[.panel-name[R Code]

```r
# https://data-atgis.opendata.arcgis.com/datasets/bus-route/data?geometry=169.841%2C-37.610%2C179.685%2C-36.072
library(leaflet)
leaflet(data = st_transform(akl_bus, crs = 4326)) %>%
  addTiles() %>%
  addPolylines(
    weight = 2,
    popup = ~ paste("Route number:", ROUTENUMBE)
  )
```

A sequence of function calls chained by the pipe `%>%`
]

.panel[.panel-name[Map]